SUMMARY

To identify candidate genes involved in Arabidopsis flavonoid biosynthesis, we applied transcriptome
coexpression analysis and independent component analysis to 1388 microarray data from publicly available
databases. Two glycosyltransferases, UGT79B1 and UGT84A2, were found to cluster with anthocyanin
biosynthetic genes. Anthocyanin was drastically reduced in ugt79b1 knockout mutants. Recombinant
UGT79B1 protein converted cyanidin 3-O-glucoside to cyanidin 3-O-xylosyl(1→2)glucoside. UGT79B1
recognized 3-O-glucosylated anthocyanidins/flavonols and uridine diphosphate (UDP)-xylose, but not
3,5-O-diglucosylated anthocyanidins, indicating that UGT79B1 encodes anthocyanin 3-O-glucoside:
2"-O-xylosyltransferase. UGT84A2 is known to encode sinapic acid: UDP-glucosyltransferase. In ugt84a2
knockout mutants, a major sinapoylated anthocyanin was drastically reduced. A comparison of anthocyanin
profiles in ugt84a knockout mutants indicated that UGT84A2 plays a major role in sinapoylation of
anthocyanin, and that other UGT84As contribute to the production of 1-O-sinapoylglucose to a lesser
extent. These data suggest major routes from cyanidin 3-O-glucoside to the most highly modified cyanidin
in the potentially intricate anthocyanin modification pathways in Arabidopsis.



INTRODUCTION 



One of the goals of plant secondary metabolism research is 
to formulate a comprehensive understanding of gene func- 
tions in a particular synthetic pathway, including regulation 
and crosstalk with other metabolic processes and/or 
metabolites. However, the enzymes involved in secondary 
metabolism can be encoded by multigene families, making 
it difficult to determine their precise physiological functions 
(D'Auria and Gershenzon, 2005; Bowles et al., 2006). Completion
of several plant genome sequencing projects has
made cataloging a finite number of genes possible, as well
as the development of multiple databases and bioresources 
for transcriptomes, proteomes, metabolomes and phe- 
nomes (Yonekura-Sakakibara and Saito, 2009; Mochida and 
Shinozaki, 2010). These 'omics platforms also provide the 
tools for genome-wide approaches based on sequence 
similarity, transcriptomics, and correlations between tran- 
scripts and metabolites in addition to the more traditional 
biochemical and reverse genetics approaches (Fridman and 
Pichersky, 2005; Moreno-Risueno et al., 2010; Saito and
Matsuda, 2010). These strategies facilitate an efficient nar- 
rowing-down of candidate genes involved in pathways of 
interest. 
Flavonoids, including the anthocyanins, flavonols, and
flavones, are some of the most intensely studied secondary
metabolites, with over 7000 known structures (Harborne and
Baxter, 1999; Anderson and Markham, 2006). Many of them
are important as flower pigments, UV-B protectants, signaling
molecules between plants and microbes, and regulators
of auxin transport (Dooner et al., 1991; Dixon and Paiva,
1995). Biosynthetic pathways leading to the flavonoid aglycones
have been well studied, and the corresponding
regulatory and synthesis genes have been characterized in
various plants (Davies and Schwinn, 2006; Tanaka et al.,
2008). However, the pathways for sequential modification,
such as glycosylation, acylation, and methylation, are still
relatively unexplored, even though these modifications
produce enormous chemical diversity and are essential for
the stable accumulation of flavonoids.

Arabidopsis thaliana is one of the most studied plants for
the molecular biology of flavonoid metabolism. The largest
number of flavonoid-related genes, including transcription
factors, have been identified from Arabidopsis because of the
extensive 'omics'-based information and bioresources available
for this species. In Arabidopsis, flavonoid skeleton
biosynthetic genes have been isolated on the basis of
similarity or mutant phenotypes (Feinbaum and Ausubel,
1988; Shirley et al., 1992; Pelletier and Shirley, 1996; Pelletier
et al., 1997; Kitamura et al., 2004). The genes involved in
modification of flavonols and anthocyanins, flavonol
3-O-rhamnosyltransferase, flavonol 7-O-glucosyltransferase
and anthocyanin sinapoyltransferase, have also been identified
by genome-wide methods based on similarity (Jones
et al., 2003; Fraser et al., 2007). Genes involved in flavonoid
biosynthesis are in general coordinately expressed. Two
genes encoding flavonoid glycosyltransferases, flavonoid
3-O-glucosyltransferase and anthocyanin 5-O-glucosyltransferase,
were identified by transcriptome analysis of
A. thaliana overexpressing PAP1, a transcription factor for
anthocyanin biosynthesis (Tohge et al., 2005). Furthermore,
two genes encoding flavonol glycosyltransferases, flavonol
7-O-rhamnosyltransferase and flavonol 3-O-arabinosyltransferase,
were efficiently targeted from among over 100
candidate UGTs by transcriptome coexpression analyses
using correlation coefficients calculated from
publicly available transcriptome data (Yonekura-Sakakibara
et al., 2007, 2008). Functional identification of six
kinds of flavonoid glycosyltransferases in one plant species
allowed us to expand the search for substrate specificity,
regiospecificity and evolutionary processes. However, even
in this highly studied species, its flavonoid structures
suggest that there are as yet unidentified genes encoding
modification enzymes.

To identify more candidate genes involved in flavonoid
biosynthesis, we used independent component analysis
(ICA) in addition to transcriptome coexpression analyses
using correlation coefficients. ICA, a form of unsupervised
algorithm, has been used as an effective analytical tool for
microarray gene expression data (Kong et al., 2008). ICA was
originally developed as a method for multi-channel
signal processing to separate mixed signals into their
different sources. In the gene expression context, ICA has
been applied to extract and characterize the informative
features of biological signals from microarray data on the
assumption that the level of gene expression is determined
by a linear combination of some independent components
corresponding to biological signals. Precise gene clustering
and classification was achieved by ICA on yeast microarray
data during sporulation (Hori et al., 2001) and during the
cell replication cycle (Liebermeister, 2002). ICA can also be
used for screening genes involved in oncogenesis and in
Alzheimer's disease (Chiappetta et al., 2004; Saidi et al.,
2004; Frigyesi et al., 2006; Kong et al., 2009). The genes
involved in the biosynthesis of anthocyanins were categorized
by ICA into two clusters, one for anthocyanin skeleton
biosynthesis and one for modification. Two glycosyltransferases,
UGT79B1 and UGT84A2, were predicted to be anthocyanin
biosynthetic genes by both transcriptome coexpression
analyses using correlation coefficients and ICA, and by ICA
only, respectively. Analyses of anthocyanin profiles in
knockout mutants and recombinant protein assays demonstrated
that UGT79B1 encodes anthocyanin 3-O-glucoside:
2"-O-xylosyltransferase and that, of the four UGT84As,
UGT84A2 plays the major role in sinapoylation of anthocyanins.
Considered in the context of the previously
known genes in the anthocyanin biosynthetic pathway, these
findings provide a 'roadmap' for the major anthocyanin
modification routes in Arabidopsis.
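
The linear-combination assumption behind ICA can be stated compactly. The following is only a sketch in
our own notation: the symbols X, A, n, m and k and the orientation of the matrices are assumptions
introduced here for illustration, and only the gene signature matrix S and the use of eight ICs are taken
from this work. The expression level of gene g on array j is modeled as a weighted sum of k independent
components:

```latex
X \approx S A, \qquad
x_{gj} \approx \sum_{c=1}^{k} s_{gc}\, a_{cj}, \qquad
X \in \mathbb{R}^{n \times m},\quad
S \in \mathbb{R}^{n \times k},\quad
A \in \mathbb{R}^{k \times m}
```

Under this reading, n = 1877 genes, m = 1388 microarray data and k = 8. Each row of S is the signature of
one gene across the ICs, so genes with similar rows are candidates for belonging to the same biological
process.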

RESULTS 

Anthocyanin UGT candidates deduced by independent 
component analysis and transcriptome coexpression 
analysis 

Previously, we conducted transcriptome coexpression
analyses using correlation coefficients to identify all of the
flavonoid-related genes and the relationship between
flavonoid synthesis and other metabolic pathways. This
examination of all Arabidopsis genes was performed using
24 known genes encoding flavonoid biosynthetic enzymes
and transcription factors as query sequences (Yonekura-Sakakibara
et al., 2008). Twenty-four genes had two or more
positive correlations (r > 0.525) with known flavonoid
pathway-related genes. Over 100 genes also showed some
correlation with one of the known flavonoid pathway-related
genes. To shed new light on the transcriptome data and to
identify previously undetected genes in the flavonoid
biosynthetic pathway, ICA of a total of 1877 genes, including
the flavonoid biosynthetic genes and all genes annotated in
AraCyc, was performed on 1388 microarray data with
ATTED-II (http://atted.jp/, ver.3) (Obayashi et al., 2009) using
the fastICA algorithm (Hyvarinen and Oja, 1997) (Figure S1,
Data S1). A hierarchical cluster analysis of gene signature
matrix S based on eight independent components (ICs)
indicates that the genes involved in the biosynthesis of
anthocyanins and flavonols form distinct clusters with a few
exceptions. Anthocyanin biosynthetic genes form a cluster
with two sub-clusters, one of which contains dihydroflavonol
reductase (DFR), anthocyanidin synthase (ANS) and
glutathione-S-transferase (GST), and is involved in
anthocyanidin skeleton biosynthesis (sub-cluster 2, Figure 1A).
The other contains anthocyanin 5-O-glucosyltransferase
(At4g14090), anthocyanin 3-O-glucoside: 6"-O-p-coumaroyltransferase
(At1g03495), anthocyanin 5-O-glucoside:
6"-O-malonyltransferase (At3g29590), and the anthocyanin-specific
MYB transcription factors PAP1 (At1g56650) and
PAP2 (At1g66390) (sub-cluster 1; Figure 1A). These data
suggest that ICA can be used to classify genes into biologically
meaningful clusters. In addition to known anthocyanin-related
genes, UGT79B1 (At5g54060) and a 2-oxoacid
dehydrogenase family protein (At5g55070) also belong to
the anthocyanin modification sub-cluster. UGT84A2
(At3g21560) is part of the anthocyanin biosynthesis
sub-cluster. Of these three candidate genes, UGT79B1 showed
high correlation (Yonekura-Sakakibara et al., 2008) and a
similar expression profile with known anthocyanin-specific
biosynthetic genes (Solfanelli et al., 2006; Kusano et al.,
2011). However, the genes encoding the 2-oxoacid dehydrogenase
family protein and UGT84A2 did not correlate significantly
with the anthocyanin-specific biosynthetic genes. By
transcriptome coexpression analyses using correlation
coefficients, UGT84A2 had relatively higher correlation
coefficients with flavonol-specific biosynthetic genes
(At3g55120, chalcone isomerase, r = 0.521; At3g51240,
flavanone 3-hydroxylase, r = 0.515; At1g06000, flavonol
7-O-rhamnosyltransferase, r = 0.488; At5g08640, flavonol
synthase, r = 0.425; At5g13930, chalcone synthase,
r = 0.424; ATTED-II all data ver. 3) than with anthocyanin-specific
genes (At5g42800, DFR, r = 0.387; At5g17220, GST,
r = 0.378; At4g22880, ANS, r = 0.369; ATTED-II all data ver.
3), although the slight but reproducible induction of
UGT84A2 by PAP1 over-expression suggests that UGT84A2
may be more distantly regulated by PAP1 (Tohge et al.,
2005).
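
The correlation-based screen described above (genes with two or more correlations above r = 0.525 to known
pathway genes) can be sketched as follows. This is a minimal illustration and not the ATTED-II
implementation: the function names, the toy expression profiles and the gene labels are hypothetical, and
only the threshold of 0.525 and the "two or more" criterion are taken from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant profile: correlation is undefined")
    return cov / sqrt(var_x * var_y)


def coexpressed_candidates(profiles, query_genes, threshold=0.525, min_hits=2):
    """Map each non-query gene to the query genes it correlates with (r > threshold),
    keeping only genes that reach at least min_hits such correlations."""
    hits = {}
    for gene, profile in profiles.items():
        if gene in query_genes:
            continue
        partners = [q for q in query_genes
                    if pearson(profile, profiles[q]) > threshold]
        if len(partners) >= min_hits:
            hits[gene] = partners
    return hits


# Hypothetical toy profiles (five arrays each); real input would span all arrays.
profiles = {
    "query_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "query_B": [2.0, 4.1, 5.9, 8.2, 10.0],
    "cand_1": [0.9, 2.1, 2.9, 4.2, 5.1],  # tracks both query genes
    "cand_2": [5.0, 1.0, 4.0, 2.0, 3.0],  # does not track either query gene
}
result = coexpressed_candidates(profiles, ["query_A", "query_B"])
print(result)
```

In this toy data set only cand_1 passes the filter. A screen of this kind cannot recover a gene such as
UGT84A2, whose pairwise correlations with the anthocyanin-specific genes all fall below the threshold.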

Flavonol biosynthetic genes fall into two sub-clusters, but 
there is no clear separation between skeleton biosynthesis 
and modification (Figure 1B). Genes encoding peroxisomal
3-keto-acyl-CoA thiolase (At5g48880), pyrophosphorylase 
(At2g18230), arogenate dehydratase (At1g08250), cinna- 
moyl-CoA reductase like protein (At4g30470) and choris- 
mate mutase (At3g29200) were found in the flavonoid 
cluster. Of these five candidate genes, only At5g48880 
showed high correlation with known flavonoid-related 
genes (Yonekura-Sakakibara et al., 2008). Thus, ICA identi- 
fied additional candidate genes that are different from those 
tagged by transcriptome coexpression analyses based on 
simple correlation coefficients. We focused on the anthocy- 
anin-related candidate genes, UGT79B1 and UGT84A2, 
because ICA based on eight ICs is apparently more suitable 
for analysis of the anthocyanin pathway. 

UGT79B1 belongs to the subfamily of UGTs catalyzing 
glycosylation at the sugar moiety of flavonoid glycosides 

The flavonoid UGT phylogenetic tree indicates that
UGT79B1 belongs to a cluster in which UGTs transfer a
glycosyl group to a sugar moiety of flavonoid glycosides
(Figure 2). UGT79B1 has some amino acid sequence identity
with IpA3G2"GlcT (48%), AcA3Ga2"XylT (48%), Ph3G2"RhaT
(37%), BpA3G2"GlcAT (28%), and CmF7G2"RhaT (27%), all
of which are known to catalyze glycosyl transfer to a sugar
moiety of flavonoid glycosides (GGT) (Bar-Peled et al., 1991;
Brugliera et al., 1994; Kroon et al., 1994; Morita et al., 2005;
Sawada et al., 2005; Montefiori et al., 2011). The high
correlation coefficients with anthocyanin biosynthetic genes,
the anthocyanin substitution patterns found in Arabidopsis
(Bloor and Abrahams, 2002) and the primary sequence
similarity of UGT79B1 to related genes suggest that
UGT79B1 encodes anthocyanin 3-O-glucoside: 2"-O-xylosyltransferase.

Sub-cluster 1
At5g55070  2-oxoacid dehydrogenase family protein
At4g14090  UGT75C1, anthocyanin 5-O-glucosyltransferase
At5g54060  UGT79B1
At1g03495  Anthocyanin coumaroyltransferase
At3g29590  Anthocyanin malonyltransferase
At1g56650  AtMYB75, PAP1
At1g66390  AtMYB90, PAP2

Sub-cluster 2
At5g17220  Glutathione S-transferase, TT19
At5g42800  Dihydroflavonol 4-reductase, TT3
At4g22880  Anthocyanidin synthase, TT18
At3g21560  UGT84A2

Figure 1. Major sub-clusters including anthocyanin- and flavonol-related genes formed by
hierarchical clustering of genes based on ICA.

(A) A major sub-cluster for anthocyanin-related genes. The genes characterized in this study are
shown in red.

(B) A major sub-cluster for flavonol-related genes.

Figure 2. Non-rooted molecular phylogenetic tree of the flavonoid glycosyltransferases.

The tree was constructed as described in Experimental procedures. The
alignment used for this analysis is available in the supplementary material
online (Data S2). Bar = 0.1 amino acid substitutions per site. The GenBank
accession numbers for the sequences are shown in parentheses: At3RhaT
(NM_102790); At3GlcT (NM_121711); At3AraT (NM_121709); Vv3GlcT
(AF000371); Ph3GalT (AF316552); Ph3GlcT (AB027454); Pf3GlcT (AB002818);
Hv3GlcT (X15694); Zm3GlcT (X13501); At5GlcT (NM_117485); Pf5GlcT
(AB013596); Vh5GlcT (AB013598); Ph5GlcT (AB027455); At7GlcT (NM_129234);
At7RhaT (NM_100480); DbB5GlcT (Y18871); NtIS5a (AF346431);
Gt3'GlcT (AB076697); CmF7G2"RhaT (AY048882); BpA3G2"GlcAT (AB190262);
AcA3Ga2"XylT (FG404013); UGT79B1 (NM_124785); IpA3G2"GlcT (AB192315);
PhA3G2"RhaT (Z25802); UGT84A2 (AY090952); UGT84A1 (BT002014).
3GlcT, flavonoid 3-O-glucosyltransferase; 3'GlcT, flavonoid 3'-O-glucosyltransferase;
3GalT, flavonoid 3-O-galactosyltransferase; 3RhaT, flavonol
3-O-rhamnosyltransferase; 5GlcT, flavonoid 5-O-glucosyltransferase; 7GlcT,
flavonol 7-O-glucosyltransferase; NtIS5a, salicylate-induced glucosyltransferase.

Abbreviations for species: Ac, Actinidia chinensis; At, Arabidopsis thaliana; Bp,
Bellis perennis; Cm, Citrus maxima; Db, Dorotheanthus bellidiformis; Gt,
Gentiana triflora; Hv, Hordeum vulgare; Ip, Ipomoea purpurea; Nt, Nicotiana
tabacum; Pf, Perilla frutescens; Ph, Petunia hybrida; Vh, Verbena hybrida; Vv,
Vitis vinifera; Zm, Zea mays.

Figure 3. Ds transposon insertion mutants of UGT79B1.

(A) Schematic representation of UGT79B1 with two Ds transposon insertion
mutations used in this work. The thick line indicates the coding region and the
thinner line indicates the 5'- and 3'-untranslated regions. UGT79B1 has no
introns. White and black triangles show the H-edge and G-edge of the Ds
transposon, respectively. Numbers indicate the position of the transposon
insertion.

(B) Reverse transcription polymerase chain reaction (RT-PCR) analysis of
transcripts in wild-type (F-Nos), Ds parental lines (Ds53 and Ds54) and two
independent homozygous mutant lines (ugt79b1-1 and ugt79b1-2).

(C) Phenotype of the wild-type, F-Nos (a); Ds53 (b); Ds54 (c); ugt79b1-1 (d); and
ugt79b1-2 (e). Plants were grown in 12% Suc-containing medium as described
in Experimental procedures.

Analysis of ugt79b1 transposon mutants indicates that
UGT79B1 encodes anthocyanin 3-O-glucoside:
2"-O-xylosyltransferase

Homozygotes of two independent ugt79b1 Arabidopsis
transposon insertion lines (Kuromori et al., 2004; Ito et al.,
2005), Ds53-4592-1 and Ds54-1263-1, were isolated and
designated as ugt79b1-1 and ugt79b1-2, respectively
(Figure 3). The transposon was inserted into the exon of
UGT79B1 in both mutants, at position +557
(ugt79b1-1) and between +61 and +86 base pairs (ugt79b1-2)
(Figure 3A). No transcripts of UGT79B1 were detected by
reverse-transcription polymerase chain reaction (RT-PCR) in
homozygotes of either line (Figure 3B). Seedlings of
wild-type and Ds parental lines of ugt79b1-1 and ugt79b1-2
accumulated anthocyanins when grown on 12% sucrose
(Suc)-containing media. Homozygous insertion lines,
ugt79b1-1 and ugt79b1-2, grown on 12% Suc-containing
media lacked the purple-coloration phenotype (Figure 3C).
The anthocyanin profiles of wild-type, Ds parental lines,
ugt79b1-1 and ugt79b1-2 were analyzed by high performance
liquid chromatography (HPLC)/photodiode array
(PDA)/electrospray ionization mass spectrometry (ESI-MS)
(Figure 4B). The major anthocyanin A11 was detected in
wild-type and Ds parental lines, and A5, A8, A9 and A10
were also detected as minor anthocyanins (Figure 4A).
A5, A8, A9, A10 and A11 were not detected in the insertion
lines ugt79b1-1 and ugt79b1-2, and total anthocyanin
content was reduced to ca. 33% of wild-type. Instead, a
compound which has an m/z value corresponding to
cyanidin 3-O-(6"-O-p-coumaroylglucoside)-5-O-(6"-O-malonylglucoside)
(m/z 843, RT 26.09 min) was detected as a
major peak in addition to cyanidin 3-O-glucoside (m/z 449,
RT 16.07 min). A peak with an m/z value corresponding to
cyanidin 3-O-6"-O-p-coumaroylglucoside (m/z 595, RT
29.26 min) was also found. ugt79b1-1 plants were transformed
with UGT79B1 cDNA under the control of the cauliflower
mosaic virus (CaMV) 35S promoter to complement
the ugt79b1-1 mutation. Independent transgenic lines had
essentially the same anthocyanin composition as wild-type
and Ds parental plants.

In vitro characterization of recombinant UGT79B1 

Recombinant UGT79B1 protein was expressed in Escherichia
coli as a His-tag fused protein and purified. The
His-UGT79B1 protein catalyzed the conversion of cyanidin
3-O-glucoside to a single product, cyanidin
3-O-xylosyl(1→2)glucoside (Figure 5), as confirmed by retention
time and MS/MS spectra. The His-tag alone as a negative
control did not catalyze conversion to the 2"-O-xyloside.
Thus, UGT79B1 can be defined as an anthocyanin
3-O-glucoside: 2"-O-xylosyltransferase.

The specificity of UGT79B1 for sugar acceptors was also
examined. UGT79B1 had significant activity with the
3-O-glucoside derivatives of anthocyanidins and flavonols
(Table 1). Interestingly, UGT79B1 recognizes cyanidin/kaempferol
3-O-rhamnosyl(1→6)glucoside but not cyanidin
3,5-O-diglucoside or kaempferol 3-O-glucoside-7-O-rhamnoside,
although the latter compounds appear to
have more free space around the glucosyl moiety at the C-3
position. UGT79B1 can utilize flavonol 3-O-glucosides as
substrates, but no flavonol 3-O-xylosylglucosides were
detected in Arabidopsis seedlings grown in 12% Suc-containing
medium.

The sugar donor specificity of UGT79B1 was examined
with UDP-xylose, UDP-arabinose, UDP-glucose,
UDP-rhamnose, UDP-galactose and UDP-glucuronic
acid as donors and cyanidin 3-O-glucoside as acceptor
(Table 1). No UGT activity was detected for UDP-sugars
other than UDP-xylose, indicating that UGT79B1 is highly
specific to UDP-xylose.

Figure 4. HPLC/PDA/MS analyses of the ugt79b1 mutant lines.

(A) Anthocyanin glycosides accumulated in Arabidopsis thaliana. (B) Anthocyanin composition
of leaves of wild-type (F-Nos), Ds parental line (Ds53), ugt79b1-deficient mutant (ugt79b1-1),
ugt79b1-deficient mutant complemented with vector only (ugt79b1-1/vector) and UGT79B1
cDNA clone (ugt79b1-1/UGT79B1). C3G, cyanidin 3-O-glucoside; *, a compound which has the m/z
value corresponding to cyanidin 3-O-(6"-O-p-coumaroylglucoside)-5-O-(6"-O-malonylglucoside)
(m/z 843, RT 26.09 min); **, a compound which has the m/z value corresponding to cyanidin
3-O-6"-O-p-coumaroylglucoside (m/z 595, RT 29.26 min).

Figure 5. HPLC analyses of the reaction products of UGT79B1 recombinant protein.

Elution profile of reaction products of His-tag protein (empty vector) and His-fused
UGT79B1 protein (UGT79B1) and the standards (cyanidin 3-O-glucoside
and cyanidin 3-O-xylosyl(1→2)glucoside). C3G, cyanidin 3-O-glucoside;
C3G2"X, cyanidin 3-O-xylosyl(1→2)glucoside.

Table 1 Substrate specificity of UGT79B1 from Arabidopsis thaliana

Sugar donor (b)         Relative activity (%)
UDP-xylose              100.0 ± 27.5
UDP-arabinose           ND
UDP-galactose           ND
UDP-glucose             ND
UDP-glucuronic acid     ND
UDP-rhamnose            ND

ND, not detected.
(a) The reactions were performed with UDP-xylose as the sugar donor.
(b) The reactions were performed with cyanidin 3-O-glucoside as the sugar acceptor.
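
Relative activities such as those reported in Table 1 are obtained by normalizing each measurement to a
reference substrate. The sketch below shows one way to do this arithmetic; it is not the procedure used
in this work, and the function name, the replicate values and the choice of the sample standard deviation
are assumptions made for illustration only.

```python
from statistics import mean, stdev


def relative_activity(replicates_by_donor, reference):
    """Express the mean activity and SD for each donor as a percentage of the
    reference mean. A donor with no measurable product is passed as an empty
    list and reported as None (ND, not detected)."""
    ref_values = replicates_by_donor[reference]
    if len(ref_values) < 2:
        raise ValueError("reference needs at least two replicates")
    ref_mean = mean(ref_values)
    if ref_mean <= 0:
        raise ValueError("reference activity must be positive")
    table = {}
    for donor, values in replicates_by_donor.items():
        if not values:
            table[donor] = None  # not detected
            continue
        sd = stdev(values) if len(values) > 1 else 0.0
        table[donor] = (100.0 * mean(values) / ref_mean, 100.0 * sd / ref_mean)
    return table


# Hypothetical replicate measurements in arbitrary units (not data from this work).
measured = {
    "UDP-xylose": [8.0, 10.0, 12.0],
    "UDP-glucose": [],
    "UDP-rhamnose": [],
}
table = relative_activity(measured, reference="UDP-xylose")
for donor, value in table.items():
    print(donor, "ND" if value is None else f"{value[0]:.1f} ± {value[1]:.1f}")
```

By construction the reference donor always has a relative activity of 100%, so its reported error
reflects only the spread among its own replicates.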

Phylogenetic analyses of the UGTs from six genome- 
sequenced plants 

To assess the origin of GGTs from a broader perspective, we
conducted phylogenetic analyses of the UGTs from six
genome-sequenced plants (Physcomitrella patens, Selaginella
moellendorffii, Populus trichocarpa, Oryza sativa,
Arabidopsis thaliana and A. lyrata) in addition to known
flavonoid GGTs. Plant UGTs fell into 24 orthologous groups
(OGs) that contained genes derived from a common ancestor
of these six species (Yonekura-Sakakibara and Hanada,
2011). All known flavonoid GGTs belong to one orthologous
group (OG8), suggesting that flavonoid GGTs are derived
from a common ancestral gene (Figure 6). Unfortunately,
the functions of other UGTs in this orthologous group have
remained elusive. However, a furofuran lignan GGT
(UGT94D1), which glucosylates the 6'-hydroxyl group of
the sugar moiety of (+)-sesaminol 2-O-glucoside, has been
isolated from Sesamum indicum (Noguchi et al., 2008).
UGT94D1 cannot utilize flavonoid glycosides as substrates,
but belongs to the same UGT94 family as BpA3G2"GlcAT.
This finding suggests that GGT function, including a recognition
mechanism for the hydroxyl group of the sugar moiety,
was established before the divergence of UGT94s,
UGT91s and UGT79s, and that the ability to
specify substrates (sugar acceptors and sugar donors) and
regiospecificity was acquired only afterwards. Functional
identification of other UGTs in OG8 will provide useful
information on UGT evolution in
terms of acquiring substrate specificity.

Figure 6. Phylogenetic tree of flavonoid glycosyltransferases in the GGT subfamily
and UGTs in the orthologous group OG8 from Physcomitrella patens,
Selaginella moellendorffii, Populus trichocarpa, Oryza sativa, Arabidopsis
thaliana and A. lyrata.

The phylogenetic tree for UGTs was generated as described previously
(Yonekura-Sakakibara and Hanada, 2011). Numbers indicate the percentage
of replicate trees in which the associated taxa clustered together in the
bootstrap test (1000 replicates). Some UGTs from the six plants in OG8 were
omitted. A3G2"GlcAT, anthocyanidin 3-O-glucoside 2"-O-glucuronosyltransferase;
A3G2"GlcT, anthocyanidin 3-O-glucoside 2"-O-glucosyltransferase;
A3G6"RhaT, anthocyanidin 3-O-glucoside 6"-O-rhamnosyltransferase;
A3Ga2"XylT, anthocyanidin 3-O-galactoside 2"-O-xylosyltransferase;
F7G2"RhaT, flavanone 7-O-glucoside 2"-O-rhamnosyltransferase.
Abbreviations for species: Ac, Actinidia chinensis; At, Arabidopsis thaliana; Bp, Bellis
perennis; Cm, Citrus maxima; Ip, Ipomoea purpurea; Ph, Petunia hybrida.

UGT84A2 supplies sinapoylglucose for anthocyanin 
modification 

By hierarchical clustering of genes based on ICA,
UGT84A2 was assigned to sub-cluster 2, which is involved in
anthocyanidin skeleton biosynthesis. It has been reported
that UGT84A2 encodes a UDP-glucose: sinapic acid
glucosyltransferase required for the biosynthesis of
1-O-sinapoylglucose (Lim et al., 2001; Sinlapadech et al., 2007).
1-O-Sinapoylglucose is utilized as an acyl donor by serine
carboxypeptidase-like acyltransferases including malate
sinapoyltransferase and choline sinapoyltransferase
(Lehfeldt et al., 2000; Shirley et al., 2001). In ugt84a2 knockout
mutants, sinapoylmalate content was slightly decreased
and the contents of other sinapoylated/feruloylated compounds
were altered compared to wild-type (Sinlapadech et al.,
2007; Meissner et al., 2008). Anthocyanin sinapoyltransferase
(SAT, At2g23000) also belongs to the serine
carboxypeptidase-like acyltransferases, suggesting that
1-O-sinapoylglucose produced by UGT84A2 may be a major
source of sinapoyl groups for anthocyanin sinapoyltransferase.
However, there is no direct evidence for a relationship
between anthocyanin biosynthesis and UGT84A2 in planta.

Analysis of ugt84a mutants supports the predominant
involvement of UGT84A2 in anthocyanin acylation

A homozygote of a transposon insertion line in the Nossen
ecotype, Ds11-5836-1, was isolated and designated as ugt84a2-1.
The G-edge of the transposon was inserted into the exon of
the mutant at position +266 (Figure 7A), creating a null
mutation (Figure 7B). When grown on 12% Suc-containing
medium, the seedling pigment phenotype of ugt84a2-1 was the
same as that of the Ds parental and wild-type lines (Figure 7C).

Figure 7. ugt84a mutants and characterization of ugt84a mutations on
anthocyanin accumulation.

(A) Schematic representation of UGT84A2 with a Ds transposon insertion
mutation and UGT84A1 with a T-DNA insertion mutation used in this work.
The thick line indicates the coding region and the thinner line indicates the
5'- and 3'-untranslated regions. UGT84A2 and UGT84A1 have no introns. Gray
triangle shows the G-edge of the Ds transposon. Black and white triangles
show right and left borders of T-DNA, respectively. Numbers indicate the
position of the insertion.

(B) RT-PCR analysis of transcripts in wild-type (F-Nos), Ds parental line (Ds11)
and an independent homozygous mutant line (ugt84a2-1) and the ugt84a2-deficient
mutant complemented with the UGT84A2 cDNA clone (ugt84a2-1/UGT84A2).

(C) Phenotype of the wild-type, F-Nos (a); Ds11 (b); ugt84a2-1 (c); Col-0 (d);
ugt84a2-2 (e); and ugt84a1-1 (f). Plants were grown in 12% Suc-containing
medium as described in Experimental procedures.

The anthocyanins of wild-type, Ds parental line Ds11 and
ugt84a2-1 were analyzed by HPLC/PDA/ESI-MS (Figure 8). In
wild-type and Ds parental lines, A11 was detected as the major
anthocyanin (>50% of total anthocyanins), and A5, A8, A9
and A10 were detected as minor peaks (5-20% of total
anthocyanins). In ugt84a2-1, A11 made up only about 20% of
the total anthocyanins and A5 accounted for 40-50%,
although there was no significant change in total anthocyanin
content. The A11 levels of ugt84a2 knockout mutants were
reduced to approximately 25% of wild-type. Compared with
the impact on other sinapoylated compounds such as
sinapoylmalate and sinapoylcholine (60-70% of wild-type)
(Sinlapadech et al., 2007; Meissner et al., 2008), the effect on
A11 content was more severe. Other sinapoylated anthocyanins
(A9 and A10) were largely unaffected. To determine
whether the changes in anthocyanin composition can be ascribed
to the UGT84A2 mutation, ugt84a2-1 plants were transformed
with UGT84A2 cDNA under the control of the CaMV 35S
promoter. Independent complemented UGT84A2 transgenic
lines had substantially the same anthocyanin composition
as wild-type and Ds parental plants (Figure 8A). These data
indicate that 1-O-sinapoylglucose produced by UGT84A2 is
a significant source of sinapoyl moieties for anthocyanins,
and that the limited supply of 1-O-sinapoylglucose affects
anthocyanin composition, reducing the content of the
sinapoylated anthocyanin A11.
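
The two kinds of percentage used above are easy to confuse: the share of one anthocyanin in the total
pool of a given line, and the level of that anthocyanin relative to wild-type. The sketch below keeps them
apart. The function names are hypothetical and the peak areas are invented numbers, chosen only to echo
the orders of magnitude reported in the text; they are not measurements from this work.

```python
def composition(peak_areas):
    """Share of each anthocyanin peak in the total anthocyanin pool, in percent."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def level_relative_to(reference, sample, name):
    """Level of one anthocyanin in `sample` as a percentage of its level in `reference`."""
    if reference[name] <= 0:
        raise ValueError("reference level must be positive")
    return 100.0 * sample[name] / reference[name]


# Invented peak areas (arbitrary units); the totals are equal in both lines.
wild_type = {"A5": 5.0, "A8": 5.0, "A9": 5.0, "A10": 5.0, "A11": 80.0}
mutant = {"A5": 45.0, "A8": 25.0, "A9": 5.0, "A10": 5.0, "A11": 20.0}

share_a11 = composition(mutant)["A11"]                    # share of the mutant's own total
level_a11 = level_relative_to(wild_type, mutant, "A11")   # level compared with wild-type
print(share_a11, level_a11)
```

Because the total pool is unchanged in this toy example, a drop of A11 to a quarter of the wild-type
level must be balanced by a rise in other peaks, which is the pattern described for A5.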

UGT84A2 and closely related UGTs (UGT84A1 and
UGT84A3) have significant UDP-glucose: sinapic acid
glucosyltransferase activity, but the specific activities of both
UGT84A1 and UGT84A3 for sinapic acid are nearly half that of
UGT84A2 (Lim et al., 2001). UGT84A2 showed the highest
correlation coefficient with UGT84A1 (r = 0.403, ATTED-II,
ver.3) among UGT84As (UGT84A3, r = 0.077; UGT84A4,
r = 0.144). To investigate the contribution of other UGT84As
to anthocyanin composition, we also isolated a homozygous
T-DNA insertion mutant of UGT84A1 (GABI_765F10),
designated as ugt84a1-1 (Figure 7A). To compare phenotypes, we
used another knockout mutant of UGT84A2 with a Col-0
background, brt1-1 (ugt84a2-2), because ugt84a1-1 has a Col-0
background. When grown on 12% Suc-containing media,
ugt84a1-1 seedlings showed a purple-color phenotype that
was indistinguishable from wild-type (Col-0) and brt1-1
(Figure 7C). We analyzed the anthocyanin profiles of
ugt84a1-1 and brt1-1 (ugt84a2-2) (Figure 8B). The brt1-1
(ugt84a2-2) mutant in a Col-0 background had a similar
anthocyanin profile to ugt84a2-1. However, no significant
change of the anthocyanin profile in ugt84a1-1 was observed
compared with wild-type (Col-0). These data indicate that
UGT84A2 is a major supplier of 1-O-sinapoylglucose for
anthocyanin modification. The predominance of A11 in
ugt84a2 knockout mutants from different backgrounds suggests
that other UGT84As also contribute to the production of
1-O-sinapoylglucose, but to a much lesser extent.

DISCUSSION 

Flavonoid UGTs which glycosylate the sugar moiety
attached to flavonoid aglycones 

The functional identification of UGT79B1 allowed us to 
compare flavonoid GGTs that glycosylate the sugar moiety 
attached to flavonoid aglycones. Generally, flavonoid UGTs 
form a unique cluster based on their regiospecificity for 
sugar acceptors (i.e. the glycosylation position of sugar 
acceptors). Furthermore, in the case of UGT, which 
glycosylates at the C-3 position of flavonoids (3GT), the 
phylogenetic tree indicates that the function of 3GT was 
established before the divergence of monocots and dicots, 
and the specificity of sugar donors was afterward (Figure 2). 
However, no such systematic phylogenetic trace was found 
in the GGTs. lpA3G2"GlcT (UGT79G16) is distant from 
Ph3G6"RhaT although they are from the same order (Sola- 
nales). Anthocyanin 3-0-galactoside: 2"-0-xylosyltransfer- 
ase from kiwifruit (Actinidia chinensis, AcA3Ga2"XylT), 
recognizes cyanidin 3-0-glucoside and UDP-xylose like 

Figure 8. HPLC/PDA/MS analyses of the ugt84a2
mutant lines.

(A) Anthocyanin composition of leaves of wild-type
(F-Nos), Ds parental line (Ds 11), ugt84a2-deficient
mutant (ugt84a2-1), ugt84a2-deficient mutant
complemented with vector only (ugt84a2-1/vector)
and UGT84A2 cDNA clone (ugt84a2-1/UGT84A2).

(B) Anthocyanin composition of leaves of wild-type
(Col-0), ugt84a2-deficient mutant (brt1-1,
ugt84a2-2) and ugt84a1-deficient mutant
(ugt84a1-1).

UGT79B1 does. However, AcA3Ga2"XylT had higher similarity
with IpA3G2"GlcT, which uses UDP-glucose as a sugar
donor (63%), than with UGT79B1 (48%). Plant UGTs have a
carboxyl-terminal consensus sequence of 44 amino acid
residues termed the plant secondary product
glycosyltransferase (PSPG) box (Mackenzie et al., 1997; Paquette
et al., 2009). The PSPG box is thought to be involved in
binding to the UDP moiety of the sugar nucleotide
(Mackenzie et al., 1997). The PSPG box of UGT79B1 showed
higher sequence identity with that of IpA3G2"GlcT (68%)
than with that of AcA3Ga2"XylT (59%), although both
UGT79B1 and AcA3Ga2"XylT recognize UDP-xylose.

Phylogenetic comparisons of flavonoid GGTs, including
predicted common ancestral UGTs at nodes 7-10, suggest
possible conserved amino acid residues involved in
recognizing UDP-xylose and anthocyanin 3-O-glucosides
(Figure S2). Ancestral sequences for four nodes (node 7 to
node 10) were inferred by a maximum likelihood method
(PAML: Phylogenetic Analysis by Maximum Likelihood,
ver. 4.3). We assumed that enzymatic divergence was due
to amino acid replacements, and that the ancestral GGT(s)
acquired the ability to recognize UDP-xylose on the branch
from node 7 to node 8. Under this assumption, the ability was
preserved in two branches (node 8 to node 9, and node 9 to
AcA3Ga2"XylT), but was lost from node 9 to IpA3G2"GlcT.
Following this simple assumption, amino acid residues
involved in recognition of the C-2" position of anthocyanin
3-O-glucoside are expected to be conserved in all
divergences except for two branches (node 10 to BpA3G2"GlcAT
and node 7 to PhA3G6"RhaT). The multiple alignment of
flavonoid GGTs shows that most amino acid residues
conserved in flavonoid GGTs are common to other known
flavonoid UGTs, and Met16 is clearly specific to flavonoid
GGTs (Figure S2). In general, UGTs belong to the GT-B fold
with two Rossmann-like domains (Coutinho et al., 2003),
and the crystal structures of plant UGTs (grape VvGT1,
UGT71G1 and UGT85H2 from Medicago truncatula) have
been determined (Shao et al., 2005; Offen et al., 2006;
Li et al., 2007). Protein modeling and site-directed
mutagenesis of BpA3G2"GlcAT suggest that N123 and D152 are
key residues for recognition of cyanidin 3-O-glucoside
(Osmani et al., 2008). However, these residues are not
conserved in other flavonoid GGTs. Because sequence identity
between plant UGTs is low, crystallization of UGT79B1
would be required for precise determination of the amino
acid residues involved in substrate recognition.
Anthocyanin modification pathway forms metabolic
grids in Arabidopsis

In Arabidopsis, it has been estimated that the anthocyanin
modification pathway forms a metabolic grid as proposed
for Perilla frutescens and Gentiana triflora (Yamazaki et al.,
1999; Fukuchi-Mizutani et al., 2003; Fraser et al., 2007).
Enzymatic characterization of anthocyanin xylosyltransferase
and of the anthocyanin profile in ugt84a2, together with
previous reports, provides a clue as to the likely major routes
of anthocyanin modification (Figure 9).

UGT79B1 recognizes cyanidin 3-O-glucoside and cyanidin
3-O-rhamnosyl(1 -> 6)glucoside as substrates, but not
cyanidin 3,5-O-diglucoside. Other UGTs which catalyze
glycosylation at the sugar moiety of flavonoid glycosides
also show considerable activity toward anthocyanin
3-O-glucosides, but no or negligible activity toward anthocyanin
3,5-O-diglucosides (Morita et al., 2005; Sawada et al., 2005).
Arabidopsis anthocyanin coumaroyltransferase, which
transfers a coumaroyl moiety to the C-6" position of cyanidin
3-O-glucoside, prefers cyanidin 3-O-glucoside, but has
negligible activity for cyanidin 3,5-O-diglucosides (Luo et al.,
2007). These data imply that xylosylation and coumaroylation
at C-2" and C-6" of cyanidin 3-O-glucoside, respectively,







Figure 9. Possible metabolic route of anthocyanin modification pathway in
Arabidopsis.

The metabolic grid was constructed by expanding that of anthocyanin
acylation in Fraser et al. (2007). Thin black/gray arrows indicate potential
glucosylation (Glc), xylosylation (Xyl), coumaroylation (Cou), sinapoylation
(Sin) and malonylation (Mal). Thick gray arrows indicate putative major
anthocyanin modification routes. Open dotted arrows indicate routes that are
unlikely based on the corresponding enzymatic property. The structures of
anthocyanins (A1-A11) are shown in Figure 5A. C3G, cyanidin 3-O-glucoside;
C3G5G, cyanidin 3,5-O-diglucoside; C3G2"X, cyanidin 3-O-xylosyl(1 -> 2)glucoside.



occur prior to glucosylation at the C-5 position. It has been
reported that acylation with a coumaroyl moiety is effective
for anthocyanin stability (Luo et al., 2007). On the other
hand, acylated anthocyanin 3-O-glucosides are unstable, but
are the best substrates for anthocyanin 3-O-glucoside
2"-O-xylosyltransferase in Matthiola incana R.Br (Teusch, 1986).
Unfortunately, acylated anthocyanin 3-O-glucosides are not
commercially available. The substantial anthocyanin
reduction observed in ugt79b1 mutants suggests that
2"-O-xylosylation is crucial for stable accumulation of
anthocyanin in Arabidopsis.

In ugt84a2 knockout mutants, A5 accumulated as a major
anthocyanin instead of A11. The contents of the sinapoylated
anthocyanins A9 and A10, and of the non-sinapoylated
anthocyanin A8, showed no significant change. This finding
suggests that anthocyanin A11 may be mainly produced
from A5 via A9. Interestingly, in an sng1-5 mutant that lacks
anthocyanin sinapoyltransferase, A8 and A5 accumulated as
major anthocyanins (Fraser et al., 2007). The differences in
accumulated anthocyanins in these mutants suggest that
the affinity of A5 for anthocyanin sinapoyltransferase may
be higher than for anthocyanin coumaroyltransferase.
Anthocyanin sinapoyltransferase may hold A5 in ugt84a2
mutants and thus inhibit further modification.

Independent component analysis provides a different 
perspective on microarray data analysis by IC numbers 

The hierarchical clustering of a gene signature matrix with a
total of 1877 metabolism-related genes based on eight ICs
formed clear clusters of genes involved in the biosynthesis
of anthocyanins and flavonols, which are slightly different
from those formed by transcriptome coexpression analysis.
Among the analyses with various numbers of ICs that we
applied, the closest anthocyanin/flavonol clusters were
observed based on eight ICs. Fukushima et al. proposed that
a small number of samples (~20) is enough to find
coexpression linkage (Fukushima et al., 2008). Kinoshita and
Obayashi examined principal component analysis (PCA) for
identifying the major factors of gene expression correlation,
and found the contribution of the first 10 principal
components (PCs) to be enough to describe 80% of the variation
within the 1388 samples of ATTED-II (Kinoshita and
Obayashi, 2009).

Interestingly, the cluster to which flavonoid
3-O-glucosyltransferase (Fd3GlcT; UGT78D3, At5g17050) belongs changes
depending on the number of ICs. For example, Fd3GlcT falls
into the flavonol cluster based on 8 ICs, but into the
anthocyanidin modification sub-cluster when based on 10
ICs (data not shown). Further, using only simple correlation
coefficients, Fd3GlcT, which can recognize both flavonols
and anthocyanidins as substrates, localizes to a flavonol
gene cluster, but not to an anthocyanin group
(Yonekura-Sakakibara et al., 2007). In addition, UGT84A2 also plays an
important role for other sinapoyltransferases, but no
correlation with other sinapoyltransferase genes was found
in this study. These data suggest that transcriptome analyses
using ICA should specify optimum IC numbers for each
metabolic pathway. Additionally, bi-functional genes may
become apparent by fine adjustment of the IC number. For
example, eight ICs may be most suitable for analysis of
the anthocyanin pathway, as demonstrated by the dual role of
flavonoid 3-O-glucosyltransferase. Recently, novel
acyl-glucose-dependent anthocyanin glucosyltransferases, which
belong to glycoside hydrolase family 1, but not UGT, were
isolated from carnation (Dianthus caryophyllus) and
delphinium (Delphinium grandiflorum) (Matsuba et al., 2010).
The application of ICA in various plant species thus might
also be useful for finding 'unexpected' genes.

EXPERIMENTAL PROCEDURES 
Plant materials 

A. thaliana accession Columbia-0 (Lehle Seeds) or accession
Nossen (Fedoroff and Smith, 1993) was used as the wild type in this
study. The mutant brt1-1 (ugt84a2-2) was described previously
(Sinlapadech et al., 2007). The Arabidopsis transposon-tagged lines
Ds53-4592-1 and Ds54-1263-1 for UGT79B1 (ugt79b1-1 and
ugt79b1-2, respectively), and Ds11-5836-1 for UGT84A2 (ugt84a2-1) were
obtained from RIKEN Bioresource Center. The T-DNA-insertion
mutant GABI_765FW for UGT84A1 (ugt84a1-1) was obtained from
the Arabidopsis Biological Resource Center. Homozygous knockout
lines were screened by PCR using specific primers for UGT79B1,
UGT84A1, UGT84A2, Ds transposon and T-DNA: UGT79B1f,
UGT79B1r, UGT84A1f, UGT84A1r, UGT84A2f, UGT84A2r, Ds5-2a,
Ds3-2a, o8409, o3144 (see Table S1). PCR products were sequenced
to determine the exact insertion points.

For analyses of anthocyanin accumulation, plants were cultured
on one-half-strength MS-agar medium containing 1% sucrose
(Valvekens et al., 1988) in a growth chamber at 22°C with 16 h/8 h
light and dark cycles for 14 days with a light intensity of 40 µmol of
photons m⁻² s⁻¹, then transferred to one-half-strength MS-agar
medium containing 12% sucrose for 3 days with a light intensity of
80 µmol of photons m⁻² s⁻¹. Plants were harvested, immediately
frozen with liquid nitrogen, and stored at -80°C until use. At least
three biological replicates were used for anthocyanin analysis.

Chemicals 

Chemicals of the highest grade commercially available were used 
unless specifically noted. Flavonoid standards were purchased from
Extrasynthese and AnalytiCon. UDP-β-L-arabinose and
UDP-α-D-xylose were purchased from CarboSource Services (supported in
part by National Science Foundation-Plant Cell Wall Biosynthesis 
Research Network grant 0090281). 

Independent component analyses 

ICA is based on the assumption that a given gene expression level is
determined by a linear combination of some independent
components corresponding to biological signals. Assuming that an
expression data matrix could be denoted as an m × n matrix X with
rows and columns representing m genes and n samples,
respectively, and could be considered to be a linear combination of ICs (i.e.
an m × k gene signature matrix S), we can describe X = SA, where A
denotes a latent mixing matrix (k × n latent vectors of the gene
expression data) (Figure S1). Here we assumed k ICs. Rows of S (i.e.
ICs) are statistically independent from each other in ICA. ICA was
carried out using the fastICA algorithm, which is based on a fixed-point
algorithm for seeking a maximum of non-Gaussian properties of the
components (Hyvarinen and Oja, 1997), with the statistical R package
'fastICA'. Pre-processed expression data, the 'GeneExp_v3' file, from
the ATTED-II website (http://atted.jp/) consisting of 1388 Affymetrix
ATH1 GeneChips were used for the analyses. These data were
originally from TAIR AtGenExpress and were normalized by robust
multichip average (Irizarry et al., 2003). For simplicity, we selected
genes associated with AraCyc metabolic pathways using the flat
file ftp://ftp.arabidopsis.org/Pathways/OLD/aracyc_dump.20091014.
The array element mappings of Affymetrix probe set identifiers to
AGI locus table from TAIR dated 29 July 2009
(affy_ATH1_array_elements-2009-7-29.txt) was used. The resulting matrix size is
1388 samples × 1877 genes. After applying the fastICA algorithm
with k components to be extracted (in this study, k = 8), the data
were subjected to hierarchical cluster analysis (HCA) with the ICs
using correlation (uncentered) and average linkage methods. The
HCA was performed by Cluster 3.0 (de Hoon et al., 2004) and was
visualized by JavaTreeView (http://jtreeview.sourceforge.net/).
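
The clustering step applied to the gene signature matrix can be illustrated with a short sketch. It is not the Cluster 3.0 implementation: it assumes that the 'correlation (uncentered)' similarity is the cosine of two signature vectors, that the distance is 1 - r, and that average linkage means the mean of all pairwise member distances. The gene names and the three-component loadings are invented toy values, not data from this study.

```python
import math

def uncentered_correlation(x, y):
    """Correlation computed without subtracting the means (cosine of the angle)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def average_linkage(rows):
    """Naive agglomerative clustering of named vectors.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of member names.
    """
    names = sorted(rows)
    # Pairwise distances between single genes: 1 - uncentered correlation.
    dist = {
        frozenset((a, b)): 1.0 - uncentered_correlation(rows[a], rows[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all member pairs.
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[frozenset(p)] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical gene signature rows (loadings on k = 3 components).
signatures = {
    "geneA": [2.0, 0.1, 0.0],
    "geneB": [4.0, 0.2, 0.0],
    "geneC": [0.0, 1.0, 3.0],
    "geneD": [0.1, 0.9, 3.1],
}

for left, right, distance in average_linkage(signatures):
    print(left, right, round(distance, 4))
```

Because the uncentered correlation ignores vector length, geneA and geneB (same loading pattern, different magnitude) are merged first in this toy example.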

Phylogenetic analysis 

UGT protein sequences were aligned by ClustalW implemented in 
MEGA4 (version 4.02; http://www.megasoftware.net/) (Tamura et al.,
2007). A phylogenetic tree was constructed with the aligned UGT 
protein sequences by MEGA4 using the neighbor-joining method 
(Saitou and Nei, 1987) with the following parameters: Poisson cor- 
rection, complete deletion, and bootstrap (1000 replicates, random 
seed = 64238). The alignment data are available in the Supple- 
mentary material online (Data S2). 
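
As an illustration of two of the parameters named above, the following sketch applies complete deletion (dropping every alignment column that contains a gap in any sequence) and then computes the Poisson-corrected distance d = -ln(1 - p), where p is the proportion of differing sites. It is a minimal sketch and not MEGA4 code; the three short sequences are invented for the example.

```python
import math

GAP = "-"

def complete_deletion(seqs):
    """Remove every alignment column in which at least one sequence has a gap."""
    keep = [i for i in range(len(seqs[0])) if all(s[i] != GAP for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def poisson_distance(a, b):
    """Poisson-corrected distance, d = -ln(1 - p); undefined when p = 1."""
    return -math.log(1.0 - p_distance(a, b))

# Hypothetical aligned protein fragments of equal length.
alignment = ["MKTLA-GHV", "MKSLAAGHV", "MRSLA-GYV"]

trimmed = complete_deletion(alignment)
print(trimmed)
print(round(poisson_distance(trimmed[0], trimmed[1]), 4))
print(round(poisson_distance(trimmed[0], trimmed[2]), 4))
```

A matrix of such pairwise distances is the input that the neighbor-joining method uses to build the tree.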

Anthocyanin profiling by HPLC/PDA/ESI-MS 

Anthocyanin extraction was carried out in triplicate as described
previously (Tohge et al., 2005). For anthocyanin profiling, Agilent
HPLC 1100 series and Agilent single quadrupole LC-MS 6120 series
(Agilent Technologies Inc., http://www.home.agilent.com/) were
used with an Atlantis® T3 column (ø4.6 mm × 150 mm, 5 µm;
Waters) at a flow rate of 0.5 ml min⁻¹ at 30°C. Anthocyanins were
separated with solvent A (10% acetonitrile, 0.1% trifluoroacetic acid
in water) and solvent B (90% acetonitrile, 0.1% trifluoroacetic acid in
water) using an elution gradient (0 min, 0% B; 40 min, 40% B;
40.1 min, 100% B; 45 min, 100% B; 45.1 min, 0% B; 50 min, 0% B).
PDA was used for the detection of UV-visible absorption in the
range of 200-600 nm. A mass analyzer was used for the detection of
anthocyanin glycosides [M]⁺ and the peak of fragment ions in a
positive ion scanning mode with the following settings: drying gas
temperature, 350°C with drying gas flow of 12 L/min; capillary
voltage, 4.0 kV; nebulizer pressure, 35 psig; fragmentor, 80 V;
detection mode, scan (m/z 100-1400).
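
The elution gradient above can be read as a list of (time, % B) set points. The sketch below interpolates between them to give the solvent composition at any time in the run. It assumes linear ramps between set points, which the text does not state explicitly for this method, and it derives the acetonitrile content from the stated compositions of solvents A (10%) and B (90%).

```python
# (time in min, % solvent B) set points taken from the gradient program.
GRADIENT = [
    (0.0, 0.0),
    (40.0, 40.0),
    (40.1, 100.0),
    (45.0, 100.0),
    (45.1, 0.0),
    (50.0, 0.0),
]

ACN_IN_A = 10.0  # % acetonitrile in solvent A
ACN_IN_B = 90.0  # % acetonitrile in solvent B

def percent_b(t, program=GRADIENT):
    """Percentage of solvent B at time t, assuming linear ramps between set points."""
    if not program[0][0] <= t <= program[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)

def acetonitrile_percent(t):
    """Overall acetonitrile content of the mobile phase at time t."""
    return ACN_IN_A + (ACN_IN_B - ACN_IN_A) * percent_b(t) / 100.0

for minute in (0, 20, 40, 42, 48):
    print(minute, percent_b(minute), acetonitrile_percent(minute))
```

Under these assumptions the separation itself runs from 10% to 42% acetonitrile over 40 min, followed by a wash at 90% and re-equilibration at 10%.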

Evaluation of T-DNA and transposon insertion mutants

For complementation tests, the full-length UGT79B1 coding region
was amplified by PCR using the primers UGT79B1-GWf and
UGT79B1-GWr (Table S1). A full-length cDNA clone of UGT84A2
(pda08060) was obtained from the RIKEN BioResource Center
Arabidopsis full-length cDNA collection (Seki et al., 1998, 2002). The
full-length UGT84A2 was amplified by PCR using the primers
UGT84A2-GWf and UGT84A2-GWr (Table S1). Amplified fragments
were cloned into the pENTR/D-TOPO vector (Invitrogen,
http://www.invitrogen.com/) as an entry vector and sequenced to confirm
the absence of PCR errors. pB2GW7 was used as a destination vector
and the LR reactions for the binary vector pKYS390 for UGT79B1 and
pKYS399 for UGT84A2 were catalyzed by the Gateway LR clonase



© 2011 The Authors 

The Plant Journal © 2011 Blackwell Publishing Ltd, The Plant Journal, (2012), 69, 154-167 






enzyme mix (Invitrogen). Transformation into Agrobacterium and
Arabidopsis, and the selection of transformants were carried out as
described previously (Yonekura-Sakakibara et al., 2007).


General molecular procedures 

The molecular procedures used were as described previously
(Yonekura-Sakakibara et al., 2004) unless otherwise specified.
RT-PCR was performed as described previously (Yonekura-Sakakibara
et al., 2004) with primers UGT79B1-RTf and UGT79B1-RTr for
UGT79B1, UGT84A2-RTf and UGT84A2-RTr for UGT84A2 and TUBf
and TUBr for tubulin (GenBank™ Accession number AK117431)
(Table S1).

Production of recombinant UGT79B1 protein and 
glycosyltransferase assays 

Full-length UGT79B1 was amplified by PCR with the primers
UGT79B1-IFf and UGT79B1-IFr to construct a protein expression
vector (Table S1). The PCR product was cloned into pCFinf using
an In-Fusion Advantage PCR cloning kit (Clontech,
http://www.clontech.com/). The nucleotide sequence of the resultant plasmid,
pKYS398, was confirmed as above. Escherichia coli strain KRX
(Promega, http://www.promega.com/) was used as a host for
expression of recombinant UGT79B1 protein. Transformed cells
were grown at 37°C until A600 reached 0.5. After the addition of 20%
(w/v) rhamnose to a final concentration of 0.1% (w/v), cells were
cultured at 18°C for 24 h. The cells were collected, and the protein was
purified as a His fusion according to the manufacturer's instructions.

The standard enzyme assay reaction mixture (final volume, 50 µl)
consisted of 50 mM HEPES-KOH, pH 7.5, 150 µM flavonoid
substrates, and 500 µM UDP-sugar. For enzyme assays with
anthocyanins as substrates, β-mercaptoethanol was added to a final
concentration of 5 mM. The mixture was preincubated at 30°C for
2 min, and the reaction was started by the addition of enzyme.
Reactions were stopped after 0, 4, 8, 12, or 30 min of incubation at
30°C by the addition of 50 µl ice-cold 0.5% (v/v) trifluoroacetic
acid/MeOH for flavonols or 50 µl ice-cold 1.0% (v/v) HCl/MeOH for
anthocyanidins and anthocyanins. Supernatants were recovered by
centrifugation at 12 000 g for 3 min. Flavonoids in the resultant
solution were analyzed using a Shimadzu HPLC system with a
Unison UK-C18 column (2.0 × 150 mm, 3 µm; Imtakt Corporation,
http://www.imtaktusa.com/) at a flow rate of 0.2 ml/min at 35°C.
Compounds were separated with a linear eluting gradient with
solvent A (0.5% trifluoroacetic acid in water) and solvent B (0.5%
trifluoroacetic acid in acetonitrile) set according to the following
profile: 0 min, 20% B; 5 min, 20% B; 10 min, 22% B; 10.1 min, 100% B;
15 min, 100% B; 15.1 min, 20% B; 20 min, 20% B. PDA was used for
the detection of UV-visible absorption in the range of 200-600 nm.
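
For readers setting up the standard reaction mixture described above, the stock volumes follow from C1 × V1 = C2 × V2. The sketch below uses the final concentrations and the 50 µl final volume given in the text. The stock concentrations are hypothetical 10× values chosen only for illustration and are not taken from this study.

```python
FINAL_VOLUME_UL = 50.0

# Final concentrations stated for the standard assay.
FINAL = {
    "HEPES-KOH pH 7.5 (mM)": 50.0,
    "flavonoid substrate (µM)": 150.0,
    "UDP-sugar (µM)": 500.0,
}

# Hypothetical 10x stock concentrations in the same units (assumed, not from the paper).
STOCK = {
    "HEPES-KOH pH 7.5 (mM)": 500.0,
    "flavonoid substrate (µM)": 1500.0,
    "UDP-sugar (µM)": 5000.0,
}

def stock_volume_ul(final_conc, stock_conc, final_volume_ul=FINAL_VOLUME_UL):
    """Solve C1 * V1 = C2 * V2 for V1; both concentrations must share a unit."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final mixture")
    return final_conc * final_volume_ul / stock_conc

volumes = {name: stock_volume_ul(FINAL[name], STOCK[name]) for name in FINAL}
# Volume left for enzyme solution and water.
remainder_ul = FINAL_VOLUME_UL - sum(volumes.values())

for name, volume in volumes.items():
    print(f"{name}: {volume:.1f} µl")
print(f"enzyme + water: {remainder_ul:.1f} µl")
```

With 10× stocks each component contributes 5 µl, leaving 35 µl for the enzyme preparation and water.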

Figure S1. A schematic ICA model of gene expression data as a
strategy for gene discovery.

Figure S2. Phylogenetic tree and multiple alignment of flavonoid
UGTs catalyzing glycosyl transfer to a sugar moiety of flavonoid
glycosides.

Table S1. Primers used in this study.

Data S1. Complete hierarchical clustering data of a gene signature
matrix with 1877 metabolism-related genes based on 8 ICs. Java
Treeview (http://jtreeview.sourceforge.net) is required for
visualization.

Data S2. The alignment used for construction of the phylogenetic
tree shown in Figure 2.